89 research outputs found

SaberX4: High-throughput Software Implementation of Saber Key Encapsulation Mechanism

Author: Sujoy Sinha Roy
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/11/2019

Saber is a module lattice-based CCA-secure key encapsulation mechanism (KEM) which has been shortlisted for the second round of NIST's Post Quantum Cryptography Standardization project. To attain simplicity and efficiency on constrained devices, the Saber algorithm is serial by construction. However, on high-end platforms, such as modern Intel processors with AVX2 instructions, Saber achieves limited speedup using vector processing instructions due to its serial nature. In this paper we overcome the above-mentioned algorithmic bottleneck and propose a high-throughput software implementation of Saber, which we call 'SaberX4', targeting modern Intel processors with AVX2 vector processing support. We apply the batching technique at the highest level of the implementation hierarchy and process four Saber KEM operations simultaneously in parallel using the AVX2 vector processing instructions. Our proof-of-concept software implementation of SaberX4 achieves nearly 1.5 times higher throughput at the cost of latency degradation within acceptable margins, compared to the AVX2-optimized non-batched implementation of Saber by its authors. We anticipate that both latency and throughput of SaberX4 will improve in the future with improved computer architectures and more optimization efforts.

Crossref

Cryptology ePrint Archive

Constant-time BCH Error-Correcting Code

Author: Matthew Walters
Sujoy Sinha Roy
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 16/04/2019

Error-correcting codes can be useful in reducing decryption failure rate of several lattice-based and code-based public-key encryption schemes. Two schemes, namely LAC and HQC, in NIST’s round 2 phase of its post-quantum cryptography standardisation project use the strong error-correcting BCH code. However, direct application of the BCH code in decryption algorithms of public-key schemes could open new avenues to the attacks. For example, a recent attack exploited non-constant-time execution of BCH code to reduce the security of LAC. In this paper we analyse the BCH error-correcting code, identify computation steps that cause timing variations and design the first constant-time BCH algorithm. We implement our algorithm in software and evaluate its resistance against timing attacks by performing leakage detection tests. To study the computational overhead of the countermeasures, we integrated our constant-time BCH code in the reference and optimised implementations of the LAC scheme as a case study, and observed nearly 1.1 and 1.4 factor slowdown respectively for the CCA-secure primitive

Cryptology ePrint Archive

A tiny coprocessor for elliptic curve cryptography over the 256-bit NIST prime field

Author: Bosmans Jeroen
Jarvinen Kimmo
Roy Sujoy Sinha
Verbauwhede Ingrid
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 17/03/2016

Crossref

University of Birmingham Research Portal

Hardware assisted fully homomorphic function evaluation and encrypted search

Author: Roy Sujoy Sinha
Verbauwhede Ingrid
Vercauteren Frederik
Vliegen Jo
Publication date: 01/09/2017

University of Birmingham Research Portal

Constant-time discrete Gaussian sampling

Author: Karmakar Angshuman
Reparaz Oscar
Roy Sujoy Sinha
Verbauwhede Ingrid
Vercauteren Frederik
Publication date: 12/03/2018

© 2018 IEEE. Sampling from a discrete Gaussian distribution is an indispensable part of lattice-based cryptography. Several recent works have shown that the timing leakage from a non-constant-time implementation of the discrete Gaussian sampling algorithm could be exploited to recover the secret. In this paper, we propose a constant-time implementation of the Knuth-Yao random walk algorithm for performing constant-time discrete Gaussian sampling. Since the random walk is dictated by a set of input random bits, we can express the generated sample as a function of the input random bits. Hence, our constant-time implementation expresses the unique mapping of the input random-bits to the output sample-bits as a Boolean expression of the random-bits. We use bit-slicing to generate multiple samples in batches and thus increase the throughput of our constant-time sampling manifold. Our experiments on an Intel i7-Broadwell processor show that our method can be as much as 2.4 times faster than the constant-time implementation of cumulative distribution table based sampling and consumes exponentially less memory than the Knuth-Yao algorithm with shuffling for a similar level of security

University of Birmingham Research Portal

ZENODO

Arithmetic of $\tau$-adic Expansions for Lightweight Koblitz Curve Cryptography

Author: Järvinen Kimmo
Roy Sujoy Sinha
Verbauwhede Ingrid
Publication date: 01/11/2018

Koblitz curves allow very efficient elliptic curve cryptography. The reason is that one can trade expensive point doublings to cheap Frobenius endomorphisms by representing the scalar as a tau-adic expansion. Typically elliptic curve cryptosystems, such as ECDSA, also require the scalar as an integer. This results in a need for conversions between integers and the tau-adic domain, which are costly and hinder the use of Koblitz curves on very constrained devices, such as RFID tags, wireless sensors, or certain applications of the Internet of things. We provide solutions to this problem by showing how complete cryptographic processes, such as ECDSA signing, can be completed in the tau-adic domain with very few resources. This allows outsourcing conversions to a more powerful party. We provide several algorithms for performing arithmetic operations in the tau-adic domain. In particular, we introduce a new representation allowing more efficient and secure computations compared to the algorithms available in the preliminary version of this work from CARDIS 2014. We also provide datapath extensions with different speed and side-channel resistance properties that require areas from less than one hundred to a few hundred gate equivalents on 0.13-µm CMOS. These extensions are applicable for all Koblitz curves.

Helsingin yliopiston digitaalinen arkisto
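
The integer-to-tau-adic conversion that the abstract above describes as costly can be illustrated with a minimal Python sketch. It uses the textbook tau-adic non-adjacent form (TNAF) recoding, not the new representation proposed in the paper. The function names `tnaf` and `evaluate` are hypothetical, and `mu` is assumed to be +1 or -1 depending on the Koblitz curve, with tau satisfying tau^2 = mu*tau - 2.

```python
def tnaf(n, mu=1):
    """Return tau-adic NAF digits of the integer n, least significant first.

    Assumption: tau satisfies tau**2 = mu*tau - 2 with mu in {+1, -1}.
    This is the textbook recoding, not the paper's representation.
    """
    r0, r1 = n, 0                     # current value is r0 + r1*tau
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2:                    # odd: choose u in {+1, -1} so the next digit is 0
            u = 2 - ((r0 - 2 * r1) % 4)
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division of (r0 + r1*tau) by tau; r0 is even at this point.
        r0, r1 = r1 + mu * (r0 // 2), -(r0 // 2)
    return digits


def evaluate(digits, mu=1):
    """Horner evaluation of sum(d_i * tau**i); returns (a, b) meaning a + b*tau."""
    a, b = 0, 0
    for d in reversed(digits):
        # (a + b*tau) * tau + d, reduced with tau**2 = mu*tau - 2
        a, b = -2 * b + d, a + mu * b
    return a, b
```

For example, `tnaf(9, mu=1)` returns `[1, 0, 0, -1, 0, 1]`, i.e. 9 = 1 - tau^3 + tau^5, and `evaluate` maps those digits back to `(9, 0)`. Each loop iteration needs integer division and sign handling, which hints at why such conversions are unattractive on very constrained devices.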

Exploring RNS for Isogeny-based Cryptography

Author: Ahmet Can Mert
David Jacquemin
Sujoy Sinha Roy
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 22/12/2022

Isogeny-based cryptography suffers from a long running time due to its requirement of a great amount of large integer arithmetic. The Residue Number System (RNS) can compensate for that drawback by making computation more efficient via parallelism. However, performing a modular reduction by a large prime which is not part of the RNS base is very expensive. In this paper, we propose a new fast and efficient modular reduction algorithm using RNS. Also, we evaluate our modular reduction method by realizing a cryptoprocessor for isogeny-based SIDH key exchange. On a Xilinx Ultrascale+ FPGA, the proposed cryptoprocessor consumes 151,009 LUTs, 143,171 FFs and 1,056 DSPs. It achieves 250 MHz clock frequency and finishes the key exchange for SIDH in 3.8 and 4.9 ms.

Cryptology ePrint Archive
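
The parallelism that the abstract above attributes to the Residue Number System can be sketched in a few lines of Python. This is only an illustration of generic RNS arithmetic, not the paper's modular reduction algorithm. The base `MODULI` and the helper names are assumptions chosen for the example, and the code needs Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
from math import prod

# Illustrative base of small pairwise-coprime moduli (an assumption, not from the paper).
MODULI = (13, 17, 19, 23)
M = prod(MODULI)  # dynamic range: results are only defined modulo M


def to_rns(x):
    """Represent x by its residues modulo each base element."""
    return tuple(x % m for m in MODULI)


def rns_mul(a, b):
    """Multiply channel by channel; each channel is independent of the others."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reconstruct the integer modulo M with the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        m_i = M // m
        total += r * m_i * pow(m_i, -1, m)  # pow(..., -1, m) is the inverse mod m
    return total % M
```

As a usage example, `from_rns(rns_mul(to_rns(1234), to_rns(56)))` equals `1234 * 56` because the product is smaller than `M`. Reducing such a result modulo a prime outside the base is not a channel-wise operation, which is the expensive step the abstract refers to.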

Additively homomorphic ring-LWE masking

Author: de Clercq Ruan
Reparaz Oscar
Roy Sujoy Sinha
Verbauwhede Ingrid
Vercauteren Frederik
Publication venue: Springer Verlag
Publication date: 04/02/2016

University of Birmingham Research Portal

PROTEUS: A Tool to generate pipelined Number Theoretic Transform Architectures for FHE and ZKP applications

Author: Ahmet Can Mert
Florian Hirner
Sujoy Sinha Roy
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 25/02/2023

Emerging cryptographic algorithms such as fully homomorphic encryption (FHE) and zero-knowledge proof (ZKP) perform arithmetic involving very large polynomials. One fundamental and time-consuming polynomial operation is the Number theoretic transform (NTT) which is a generalization of the fast Fourier transform. Hardware platforms such as FPGAs could be used to accelerate the NTTs in FHE and ZKP protocols. One major problem is that the FHE and ZKP protocols require different parameter sets, e.g., polynomial degree and coefficient size, depending on their applications. Therefore, a basic research question is: How to design scalable hardware architectures for accelerating NTTs in the FHE and ZKP protocols? In this paper, we present ‘PROTEUS’, an open-source and parametric tool that generates synthesizable bandwidth-efficient NTT architectures for user-specified parameter sets. The architectures can be tuned to utilize different memory bandwidths and parameters which is a very important design requirement in both FHE and ZKP protocols. The generated NTT architectures show a significant performance speedup compared to similar NTT architectures on FPGA. Further comparisons with state-of-the-art show a reduction of up to 23% and 35% in terms of DSP and BRAM utilization

Cryptology ePrint Archive
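
To make the Number Theoretic Transform mentioned in the abstract above concrete, here is a minimal Python sketch of a transform over a small prime field and its use for cyclic polynomial multiplication. It is unrelated to the architectures PROTEUS generates. The parameters `Q = 17`, `N = 8` and `OMEGA = 2` are toy assumptions (2 has multiplicative order 8 modulo 17), and the transform is the quadratic-time definition rather than a fast butterfly network.

```python
Q = 17      # toy prime modulus (assumption for illustration)
N = 8       # transform length; must divide Q - 1
OMEGA = 2   # primitive N-th root of unity modulo Q: 2**8 % 17 == 1, 2**4 % 17 == 16


def ntt(a, root=OMEGA):
    """Naive O(N^2) transform: A[i] = sum_j a[j] * root**(i*j) mod Q."""
    return [sum(a[j] * pow(root, i * j, Q) for j in range(N)) % Q for i in range(N)]


def intt(A):
    """Inverse transform: use the inverse root and scale by N^-1 mod Q."""
    n_inv = pow(N, -1, Q)
    inv_root = pow(OMEGA, -1, Q)
    return [(n_inv * x) % Q for x in ntt(A, inv_root)]


def cyclic_mul(a, b):
    """Multiply two polynomials modulo (x^N - 1, Q) via pointwise products."""
    return intt([(x * y) % Q for x, y in zip(ntt(a), ntt(b))])
```

Calling `cyclic_mul(a, b)` on two length-8 coefficient lists gives the same result as schoolbook multiplication followed by reduction modulo x^8 - 1 and 17. Practical FHE and ZKP parameter sets use much larger degrees and coefficient sizes, which is why the choice of these parameters matters for hardware generators.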

Teaching HW/SW codesign with a Zynq ARM/FPGA SoC

Author: Balasch Josep
Beckers Arthur
Bozilov Dusan
Sinha Roy Sujoy
Turan Furkan
Verbauwhede Ingrid
Publication venue: IEEE Computer Society Press
Publication date: 24/09/2018

© 2017 IEEE. In this paper we describe a lab session-based hardware/software (HW/SW) codesign course for implementing embedded systems. The goals of the course are to teach the fundamental concepts of embedded system design, develop hands-on HW/SW codesign skills, and to show that there are many possible ways to explore the design space. The reason behind choosing HW/SW codesign approach is that it brings the best of the two worlds: the flexibility of SW and the power/energy/computation efficiency of HW. As an example project, students codesign the well-known RSA public-key cryptosystem in the Xilinx Zybo boards that contain a Xilinx 7-series FPGA coupled with an embedded ARM processing unit. Students are required to explore the design space, weigh the various alternatives and take design decisions. Besides, the project cultivates non-technical skills such as team building and management, sharing of work-load, decision making, presentation and technical report writing

Crossref

University of Birmingham Research Portal

ZENODO
